packages <- c("raster", "maps", "mapview", "tidyverse", "viridis", "sp", "rasterVis")
#lapply(packages, install.packages)
lapply(packages, library, character.only = T)

Visualization with ggplot2

First, let’s grab some data that we can explore visually!

DATA <- read.csv("Ticks.csv", header = T)

TICKS!!! This is part of a dataset looking at the association among fire ants, ticks, small mammals, and tick-borne diseases (Decreased small mammal and on-host tick abundance in association with invasive red imported fire ants (Solenopsis invicta), Castellanos et al. 2016).

It is generally always good practice to take a look at your data beforehand to see what you are dealing with (check dimensions, see what the first few rows look like, and check out a summary of the data and what class each column is).

dim(DATA)
## [1] 286  21
head(DATA)
##      Date Year Season PlotID      LabelID Transect Treatment TrapID Weight
## 1 10/2013 2013   Fall    GRR GRR 13102001       T2   Treated     A3   26.0
## 2 10/2013 2013   Fall    GRR GRR 13102002       T2   Treated    C13   11.0
## 3 11/2013 2013   Fall    GRR GRR 13111601       T2   Treated     C2   10.0
## 4 11/2013 2013   Fall    GRR GRR 13111602       T2   Treated    C11  196.0
## 5 11/2013 2013   Fall    GRR GRR 13111603       T2   Treated    C20    9.0
## 6 11/2013 2013   Fall    GRR GRR 13111604      UT1 Untreated    C19   31.5
##               ScientificName Sex Recapture NDRecapture DateLastCaught
## 1       Chaetodipus hispidus   M         N           N               
## 2 Reithrodontomys fulvescens   M         N           N               
## 3            Baiomys taylori   F         N           N               
## 4          Sigmodon hispidus   F         N           N               
## 5 Reithrodontomys fulvescens   M         N           N               
## 6       Chaetodipus hispidus   M         N           N               
##   EarTagNumber LREarTag TickPresence TotalTicks         TickSpecies
## 1          172        L            0          0                    
## 2          173        L            1          5 Amblyomma maculatum
## 3           NA                     0          0                    
## 4           31        R            0          0                    
## 5           32        L            0          0                    
## 6           33        L            0          0                    
##   Life.Stage Mortality
## 1                    N
## 2         LL         N
## 3                    N
## 4                    N
## 5                    N
## 6                    N
summary(DATA)
##       Date         Year         Season    PlotID           LabelID   
##  2/2014 :72   Min.   :2013   Fall  : 12   GRR:286              : 15  
##  4/2014 :43   1st Qu.:2014   Spring: 99             GR 14040501:  1  
##  7/2014 :40   Median :2014   Summer: 65             GR 14040502:  1  
##  12/2013:38   Mean   :2014   Winter:110             GR 14040503:  1  
##  3/2014 :35   3rd Qu.:2014                          GR 14040504:  1  
##  6/2014 :25   Max.   :2014                          GR 14040505:  1  
##  (Other):33                                         (Other)    :266  
##  Transect      Treatment       TrapID        Weight      
##  T1 : 90   Treated  :206   B2     : 10   Min.   :  2.00  
##  T2 :116   Untreated: 80   C15    :  9   1st Qu.:  7.00  
##  UT1: 53                   A2     :  8   Median :  8.75  
##  UT2: 27                   B9     :  8   Mean   : 14.75  
##                            C11    :  8   3rd Qu.: 12.00  
##                            C12    :  8   Max.   :197.50  
##                            (Other):235   NA's   :18      
##                     ScientificName   Sex      Recapture NDRecapture
##  Baiomys taylori           :163        : 21   N :248    N:273      
##  Chaetodipus hispidus      : 16    F   :125   Y : 25    Y: 13      
##  Cryptotis parva           :  9    M   :138   Y?: 13               
##  Perognathus merriami      :  1    NA's:  2                        
##  Peromyscus leucopus       : 11                                    
##  Reithrodontomys fulvescens: 75                                    
##  Sigmodon hispidus         : 11                                    
##   DateLastCaught  EarTagNumber    LREarTag    TickPresence    
##          :263    Min.   :  1.00       :162   Min.   :0.00000  
##  2/9/14  :  8    1st Qu.: 43.75   L   : 55   1st Qu.:0.00000  
##  6/4/14  :  3    Median : 62.50   R   : 65   Median :0.00000  
##  11/16/13:  2    Mean   : 69.28   NA's:  4   Mean   :0.05594  
##  12/5/13 :  2    3rd Qu.: 94.50              3rd Qu.:0.00000  
##  2/10/14 :  2    Max.   :173.00              Max.   :1.00000  
##  (Other) :  6    NA's   :166                                  
##    TotalTicks                  TickSpecies  Life.Stage Mortality
##  Min.   :0.0000                      :270     :270     N:278    
##  1st Qu.:0.0000   Amblyomma maculatum: 16   LL: 12     Y:  8    
##  Median :0.0000                             NN:  4              
##  Mean   :0.1259                                                 
##  3rd Qu.:0.0000                                                 
##  Max.   :9.0000                                                 
## 

Each row is a mammal that was processed for ticks.

ggplot2 has two general methods of plotting: qplot and ggplot

plot(DATA$ScientificName, DATA$Weight)

This is using the base::plot function.

qplot(data = DATA, x = ScientificName, y = Weight, geom = "boxplot")

ggplot2::qplot is ggplot2’s version of a quick plot, hence the name, coding denizens are lazy (see: every package that begins with “r”). You specify the data.frame (data = ), the x axis (x = ), the y axis (y = ), and the type of plot you want (geom = ). A quick note. ggplot2 only really likes data.frames. Don’t give the data argument a matrix and cry when it spits out a totally non nebulous error.

Now, I’d like to introduce you to my good frenemy, ggplot.

ggplot(data = DATA, aes(x = ScientificName, y = Weight)) + geom_boxplot()

The plot that results is the same as from qplot, but the way we got there is different (unless you haven’t noticed the ggplot2::aes function in the room). ggplot2::aes tells the plot the aesthetics (x and y axes, color, fill, groups, size, shape, etc.) that are used in other layers. After that, you add the plot in layers called by a geom_XXXX function.

ggplot(data = DATA, aes(x = Treatment, y = Weight, fill = ScientificName)) + geom_boxplot()

By just adding a fill aesthetic, we separated out each species’ weight by scientific name.

What else can you do? What mammals did we find?

BAR <- ggplot(data = DATA, aes(x = Transect, fill = ScientificName)) + geom_bar(col = "black")
BAR

Here, I just wanted to see what mammal species (fill = ScientificName) were found in each transect (x = Transect) and how many. I’m sure y’all are absolutely shocked that geom_bar brings up a bar plot.

“But Adrian,” you say. “Aren’t species names in italics?!” Correct, and ggplot2 fun shouldn’t come at the expense of taxonomic correctness.

BAR + theme(legend.text = element_text(face = "italic"))

Most of the minor changes to a plot, its axes, the legend, etc. are confined to the ggplot2::theme function. Go look at theme reference and weep internally at the all of the arguments listed for the function. Here, we are concerned with just the legend text (or legend.text, if you will). All adjustments to text items are done using the ggplot2::element_text function (P.S. if seeing text makes you squeamish, substitute element_text with element_blank()).

Alright, I’m going to do something horrible and unleash piping and dplyr.

DATA %>% group_by(Treatment, Date) %>%
    summarize(n = n()) %>%
    ggplot(aes(x = Treatment, y = n)) + geom_boxplot()

So the pipe symbol (%>%) takes whatever the result is from the left and places it as the first argument in the function on the right. You can take a look at it bit by bit (after each %>%). Overall, it makes the code a lot more readable.

Otherwise summarize(group_by(DATA, Treatment, Date), n = n()) is the other alternative just to get before the ggplot function.

Need another plot example? This time we’ll look at a batch of jittered points.

ggplot(data = DATA, aes(x = ScientificName, y = Weight, fill = Treatment)) + geom_jitter(height = 0, width = 0.25)

Oh no! What happened?! Well, clearly what happened is that you are using the wrong symbol. Obviously. If you don’t want this to happen, you can also change “fill” to “color.” There’s a bunch of handy images on the ever magical Google image search for “pch r” that show what part of symbol is controlled by color and what part is controlled by fill.

ggplot(data = DATA, aes(x = ScientificName, y = Weight, fill = Treatment)) + geom_jitter(height = 0, width = 0.25, pch = 21)

You’ll notice that I added a few things to the geom_jitter function. I specified no jitter to the height of a point, capped the jitter on the x axis, and changed the symbol of the points using pch.

But now I want to do something slightly more complicated by giving a line showing the mean weight of each group (species in treatment type). To do this, we will use the ggplot2::stat_summary function.

ggplot(data = DATA, aes(x = ScientificName, y = Weight, fill = Treatment)) + geom_jitter(height = 0, width = 0.25, pch = 21) + stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.., col = Treatment), width = 0.7, size = 0.75)

This function will summarize our data and plot it out depending on what we specify in the geom argument (can be a point, errorbar, crossbar, etc.). We want the mean of the y axis (hence the use of the fun.y argument) and only want a simple line, not an entire errorbar. To do this, we set the ymin and ymax to the same value as y (the ..y.. simply refers to the calculated value of y by the function).

This is the first clear evidence you have seen so far that ggplot2 works as a series of layers (don’t beleive me? put stat_summary in front of geom_jitter).

You can specify your own color palette in ggplot2::scale_fill_manual by giving the values argument a vector of colors.

COLS <- c("#009E73", "#CC79A7", "#D55E00", "#0072B2", "#F0E442", "#E69F00","#56B4E9") #colorblind friendly palette from http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/
BAR + scale_fill_manual(name = "Species", values = COLS)

The name argument also allows you to change the title of the legend.

The ggplot2::labs function allows you to change the x and y axis titles and the main title and the use of \n in a string tells ggplot2 to make a hard return.

BAR + labs(y = "No. Individuals Caught", title = "Goliad\nMammal Captures")

BAR + scale_x_discrete(limits = c("T1", "UT1", "T2", "UT2"), labels = c("Treated 1", "Untreated 1", "Treated 2", "Untreated 2")) + scale_y_continuous(breaks = seq(0, 200, 15))

The scale functions are useful for modifying the scale however you want. They follow the pattern of scale_“axis in question”_“type of data.” Our x axis has discrete data (factors relating to each transect) while our y axis is continuous (number of captures). You can use the limits argument to reorder things and the labels argument to rename them.

You can also specify coordinates (x, y) to tell the legend where to be.

BAR + theme(legend.position = "bottom")

BAR + theme(legend.position = c(0.8, 0.75), legend.background = element_blank())

But what if you don’t like that traditional gray panel background (there are OPINIONS about it)? You can change it by giving it the ggplot2::element_rect function and telling it to change the fill (or by giving it element_blank()). You can also change the color of the grid lines by specifying a color argument in element_line.

BAR + theme(panel.background = element_rect(fill = "white"), panel.grid = element_line(color = "grey70"))

Do you hate Adobe Illustrator or similar programs for making figures because you forget everything each time you make a figure? Let’s see how to combine plots because hopefully things havent gone off the rails by now.

library(cowplot)
BAR 

See what I mean by people having OPINIONS on how everything should look

BAR <- BAR + scale_y_continuous(expand = c(0, 0)) #this has been bothering me while writing all of this, so I'm finally fixing it (got rid of the space between the x axis and 0 by using the expand argument in a scale function)
SPEC <- c("B. taylori", "C. hispidus", "C. parva", "P. merriami", "P. leucopus", "R. fulvescens", "S. hispidus")
PLT <- ggplot(data = DATA, aes(x = ScientificName, y = Weight, fill = Treatment)) + geom_jitter(height = 0, width = 0.25, pch = 21) + stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.., col = Treatment), width = 0.7, size = 0.75) + scale_x_discrete(labels = SPEC) + theme(axis.text.x = element_text(size = 10, angle = 75, vjust = 0.5))
PLT

plot_grid(BAR, PLT, labels = c("A", "B"), align = "h")

The cowplot::plot_grid function will put your multiple plot objects in the same plot and can label them with the labels argument.

ggsave("HorribleAwfulPlot.pdf")

By default, ggplot2::ggsave saves the most recent plot you brought up (you can also give it a plot object in the plot argument) at the dimensions that it occurs. You can adjust dpi, width, and height in here as well if need be.

Check ?ggsave

No, for real. The help pages in R are invaluable. Most questions that people ask me are either solvable by taking the time to read the function help page or googling an error code (or things being the wrong class).

Introduction to spatial data

package::sp

We will be introducing three types of data today, how they differ, and how to manipulate and visualize them.

Point data

BRTC <- read.csv("BRTCMammals.csv", header = T) 
dim(BRTC)
## [1] 62415    15
summary(BRTC)
##                      name          orderKey      familyKey      
##  Peromyscus maniculatus: 3282   Min.   : 731   Min.   :   5297  
##  Tadarida brasiliensis : 3115   1st Qu.: 734   1st Qu.:   9366  
##  Peromyscus leucopus   : 2130   Median :1459   Median :   9437  
##  Artibeus jamaicensis  : 2019   Mean   :1109   Mean   :1155693  
##  Sigmodon hispidus     : 1845   3rd Qu.:1459   3rd Qu.:3240723  
##  Peromyscus levipes    : 1541   Max.   :1494   Max.   :7456382  
##  (Other)               :48483                                   
##     stateProvince        year                country       recordNumber  
##  Texas     :18054   Min.   :1929    United States:28364   1      :  319  
##  Chiapas   : 2866   1st Qu.:1984    Mexico       :17733   2      :  304  
##  Tamaulipas: 1720   Median :1991    Honduras     : 5417   4      :  284  
##  Veracruz  : 1540   Mean   :1991    Canada       : 1680   3      :  278  
##  Guerrero  : 1409   3rd Qu.:1995    Guatemala    : 1628   5      :  272  
##  (Other)   :35661   Max.   :2015    (Other)      : 7515   (Other):58570  
##  NA's      : 1165   NA's   :58150   NA's         :   78   NA's   : 2388  
##        county                                  locality    
##  Brewster : 1886   No specific locality            :  661  
##  Brazos   : 1193   7.4 mi SSW San Lorenzo          :  365  
##  Hill     :  682   Lancetilla, 40 m                :  305  
##  Tyler    :  587   10.8 mi S, 2.6 mi W Jicaro Galan:  301  
##  Culberson:  584   9.5 mi NW Melaque               :  299  
##  (Other)  :22855   (Other)                         :60089  
##  NA's     :34628   NA's                            :  395  
##                           preparations   catalogNumber   institutionCode
##  skin | skull                   :35050   1      :    1   TCWC:62415     
##  alcoholic                      : 9092   10     :    1                  
##  skeleton                       : 5253   100    :    1                  
##  skull                          : 5107   1000   :    1                  
##  skin in alcohol | body skeleton: 3336   10000  :    1                  
##  (Other)                        : 4554   10001  :    1                  
##  NA's                           :   23   (Other):62409                  
##  decimalLongitude  decimalLatitude  geodeticDatum
##  Min.   :-148.21   Min.   :-28.42   WGS84:15475  
##  1st Qu.:-104.56   1st Qu.: 29.93   NA's :46940  
##  Median : -98.70   Median : 30.88                
##  Mean   :-100.13   Mean   : 32.64                
##  3rd Qu.: -96.30   3rd Qu.: 34.95                
##  Max.   :  61.81   Max.   : 66.40                
##  NA's   :46940     NA's   :46940

These data are records from the Biodiversity Research and Teach Collection’s mammal collection here at TAMU gathered from GBIF.

BRTC %>% filter(stateProvince == "Texas") %$%
     county %>% 
     unique %>% 
     length
## [1] 244

I wanted to see how many counties in Texas are represented by specimens in the BRTC. Using a pipeline (hissing from back of the room at the sight of %>%) is easier to read than length(unique(BRTC[BRTC$stateProvince %in% "Texas", "county"])) by breaking it down into steps and allowing you to read from left to right.

You may have also noticed the %$% symbol which is used to index similar to how BRTC$county would be.

But we want it in an easy spatial representation. Additionally, the sp package really dislikes NA values (which we have a ton of unfortunately), so let’s get rid of those too.

TEX <- BRTC %>% filter(stateProvince == "Texas" & !is.na(decimalLongitude))
TEX <- SpatialPointsDataFrame(coords = TEX[, 13:14], data = TEX, proj4string = crs("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))

The sp::SpatialPointsDataFrame requires coordinates (decimalLongitude and decimalLatitude), data to be placed in the “DataFrame” portion of the object (TEX), and a coordinate reference system (all points from GBIF are WGS84).

Spatial objects from sp have a few different pieces to them that you can query via @ (this is an S4 object rather than an S3, so the usual $ doesn’t work at these upper levels) A few options are TEX@bbox (min-max coords), TEX@proj4string (CRS nonsense), TEX@coords (coordinates for each point), TEX@data (all of the rest of the data).

head(TEX)
##               name orderKey familyKey stateProvince year       country
## 1  Cryptotis parva      803      5534         Texas 2013 United States
## 2 Geomys attwateri     1459      9437         Texas 2011 United States
## 3 Geomys attwateri     1459      9437         Texas 2011 United States
## 4 Geomys breviceps     1459      9437         Texas 2011 United States
## 5 Geomys breviceps     1459      9437         Texas 2011 United States
## 6 Geomys breviceps     1459      9437         Texas 2011 United States
##   recordNumber   county                     locality
## 1            1  Wharton                     TT Ranch
## 2          317 Burleson       Caldwell, 4251 FM 2001
## 3          316 Burleson       Caldwell, 4251 FM 2000
## 4           24   Brazos 7466 Raymond Stotzer Parkway
## 5          331   Brazos                  7466 Hwy 60
## 6           21   Brazos     TAMU Sheep Center, Bryan
##                      preparations catalogNumber institutionCode
## 1                       alcoholic         62533            TCWC
## 2 skin in alcohol | body skeleton         61912            TCWC
## 3 skin in alcohol | body skeleton         61911            TCWC
## 4 skin in alcohol | body skeleton         60888            TCWC
## 5 skin in alcohol | body skeleton         60858            TCWC
## 6 skin in alcohol | body skeleton         60860            TCWC
##   decimalLongitude decimalLatitude geodeticDatum
## 1        -95.96709        29.33492         WGS84
## 2        -96.68737        30.61135         WGS84
## 3        -96.68737        30.61135         WGS84
## 4        -96.40987        30.56515         WGS84
## 5        -96.56444        30.80250         WGS84
## 6        -96.40913        30.56267         WGS84
summary(TEX)
## Object of class SpatialPointsDataFrame
## Coordinates:
##                        min      max
## decimalLongitude -106.6309 61.80891
## decimalLatitude    20.5588 66.39965
## Is projected: FALSE 
## proj4string :
## [+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0]
## Number of points: 10101
## Data attributes:
##                          name         orderKey      familyKey      
##  Sigmodon hispidus         :1022   Min.   : 731   Min.   :   5298  
##  Peromyscus leucopus       : 844   1st Qu.:1459   1st Qu.:   9368  
##  Peromyscus gossypinus     : 625   Median :1459   Median :   9679  
##  Geomys breviceps          : 559   Mean   :1306   Mean   :1564320  
##  Reithrodontomys fulvescens: 418   3rd Qu.:1459   3rd Qu.:3240723  
##  Peromyscus maniculatus    : 310   Max.   :1459   Max.   :3240723  
##  (Other)                   :6323                                   
##          stateProvince        year               country     
##  Texas          :10101   Min.   :1936   United States:10101  
##  Aguascalientes :    0   1st Qu.:1989   Argentina    :    0  
##  Ahuachapán     :    0   Median :2003   Australia    :    0  
##  Al Hillah, Liwa:    0   Mean   :1997   Azerbaijan   :    0  
##  Alabama        :    0   3rd Qu.:2004   Belize       :    0  
##  Alajuela       :    0   Max.   :2013   Bermuda      :    0  
##  (Other)        :    0   NA's   :9696   (Other)      :    0  
##   recordNumber       county    
##  5      : 135   Hill    : 661  
##  4      : 134   Brazos  : 559  
##  1      : 133   Tyler   : 542  
##  3      : 125   Hardin  : 499  
##  2      : 122   Presidio: 497  
##  (Other):9082   Kerr    : 484  
##  NA's   : 370   (Other) :6859  
##                                       locality   
##  14 mi SW Hunt, 2200 ft                   : 197  
##  8 mi NE Candelaria                       : 166  
##  7 mi S, 1 mi E Port Lavaca               : 164  
##  18 mi N, 4.5 mi E Oilton, Gates Lake Area: 132  
##  6.5 mi N, 6.5 mi W Raymondville          : 120  
##  7 mi N, 4.5 mi W Raymondville            : 104  
##  (Other)                                  :9218  
##                           preparations  catalogNumber   institutionCode
##  skin | skull                   :5701   1015   :    1   TCWC:10101     
##  skeleton                       :1734   1016   :    1                  
##  skull                          :1185   1017   :    1                  
##  alcoholic                      : 601   1018   :    1                  
##  skin in alcohol | body skeleton: 394   1027   :    1                  
##  (Other)                        : 481   1028   :    1                  
##  NA's                           :   5   (Other):10095                  
##  decimalLongitude  decimalLatitude geodeticDatum
##  Min.   :-106.63   Min.   :20.56   WGS84:10101  
##  1st Qu.: -99.57   1st Qu.:29.32                
##  Median : -97.61   Median :30.31                
##  Mean   : -98.18   Mean   :30.20                
##  3rd Qu.: -96.20   3rd Qu.:31.02                
##  Max.   :  61.81   Max.   :66.40                
## 

As you can see, it acts pretty much like a regular data.frame would except for being high maintenance when you want something particular from it.

The good thing about these spatial classes is that they are easier to plot

plot(TEX)

#Well...looks like there are some points that aren't georeferenced correctly. Let's toss those out.

Polygons

To get rid of points let’s introduce polygons in R.

CNTY <- raster::getData(country = "USA", level = 2) %>% .[.$NAME_1 %in% "Texas", ]

raster::getData is a great resource for easy to access administrative polygons. Here we grab county polygons for the entire United States and then whittle that down to only the Texas counties. Within the piping ecosystem, the . symbol represents the entire object resulting from the left, allowing us to index it like normal but without typing out CNTY.

CNTY
## class       : SpatialPolygonsDataFrame 
## features    : 254 
## extent      : -106.6458, -93.50874, 25.83808, 36.50344  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 15
## names       : OBJECTID, ID_0, ISO,        NAME_0, ID_1, NAME_1, ID_2,   NAME_2,   HASC_2, CCN_2, CCA_2, TYPE_2, ENGTYPE_2, NL_NAME_2, VARNAME_2 
## min values  :     2528,  244, USA, United States,   44,  Texas, 2528, Anderson, US.TX.AA,    NA,      , County,    County,          ,           
## max values  :     2781,  244, USA, United States,   44,  Texas, 2781,   Zavala, US.TX.ZV,    NA,      , County,    County,          ,
#254 features (the number of counties in the Republic of Texas)
class(CNTY)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"

A SpatialPolygonsDataFrame unsurprisingly behaves very similarly to the SpatialPointsDataFrame (don’t run attributes on this unless you want to scroll around a bunch). The different slots that can be accessed by @ are data, polygons, plotOrder, bbox, and proj4string (I used the attributes function and scrolled for your sins).

But where were we?

OVER <- over(TEX, CNTY)
head(OVER, 20)
##    OBJECTID ID_0  ISO        NAME_0 ID_1 NAME_1 ID_2    NAME_2   HASC_2
## 1      2768  244  USA United States   44  Texas 2768   Wharton US.TX.WN
## 2      2553  244  USA United States   44  Texas 2553  Burleson US.TX.BU
## 3      2553  244  USA United States   44  Texas 2553  Burleson US.TX.BU
## 4      2548  244  USA United States   44  Texas 2548    Brazos US.TX.BZ
## 5      2725  244  USA United States   44  Texas 2725 Robertson US.TX.RO
## 6      2548  244  USA United States   44  Texas 2548    Brazos US.TX.BZ
## 7      2548  244  USA United States   44  Texas 2548    Brazos US.TX.BZ
## 8      2548  244  USA United States   44  Texas 2548    Brazos US.TX.BZ
## 9      2548  244  USA United States   44  Texas 2548    Brazos US.TX.BZ
## 10     2548  244  USA United States   44  Texas 2548    Brazos US.TX.BZ
## 11     2548  244  USA United States   44  Texas 2548    Brazos US.TX.BZ
## 12     2548  244  USA United States   44  Texas 2548    Brazos US.TX.BZ
## 13     2548  244  USA United States   44  Texas 2548    Brazos US.TX.BZ
## 14       NA   NA <NA>          <NA>   NA   <NA>   NA      <NA>     <NA>
## 15     2563  244  USA United States   44  Texas 2563  Chambers US.TX.CH
## 16     2603  244  USA United States   44  Texas 2603    Fisher US.TX.FH
## 17     2603  244  USA United States   44  Texas 2603    Fisher US.TX.FH
## 18     2603  244  USA United States   44  Texas 2603    Fisher US.TX.FH
## 19     2603  244  USA United States   44  Texas 2603    Fisher US.TX.FH
## 20     2620  244  USA United States   44  Texas 2620    Grimes US.TX.GM
##    CCN_2 CCA_2 TYPE_2 ENGTYPE_2 NL_NAME_2 VARNAME_2
## 1     NA       County    County                    
## 2     NA       County    County                    
## 3     NA       County    County                    
## 4     NA       County    County                    
## 5     NA       County    County                    
## 6     NA       County    County                    
## 7     NA       County    County                    
## 8     NA       County    County                    
## 9     NA       County    County                    
## 10    NA       County    County                    
## 11    NA       County    County                    
## 12    NA       County    County                    
## 13    NA       County    County                    
## 14    NA  <NA>   <NA>      <NA>      <NA>      <NA>
## 15    NA       County    County                    
## 16    NA       County    County                    
## 17    NA       County    County                    
## 18    NA       County    County                    
## 19    NA       County    County                    
## 20    NA       County    County

sp::over is a wonderful function that checks for records/attributes that are overlayed in both spatial objects you give it. It is a good way of grabbing only those data that exist in the geographic space of another object (e.g., you don’t want disgusting marine mammals ruining your Texas mammal dataset) Starship Troopers voice “Would you like to know more?” there are vignettes on the sp::over function if you yearn for knowledge/are afflicted by insomnia.

Grab only those points that are actually within Texas.

TEX <- TEX[-which(is.na(OVER[, 1])), ]

But did it work?

plot(TEX, pch = 19) #+ maps::map(add = T, database = "county", "texas")

Yes! (Hopefully it works for you as well, if not ¯_(ツ)_/¯). The maps::map function can add quick boundaries, the default setting is set to “world” but it can be switched to “state” or “county”

ggplot2 again!

ggplot() + geom_point(data = as.data.frame(TEX), aes(x = decimalLongitude, y = decimalLatitude), col = "#880000")

Gasp, colored points! Gasp, geom_point creates points! You’ll notice that we changed TEX back into a data.frame (because ggplot2 is high maintenance).

Points were cool, but now let’s use the ggplot2::geom_polygon function to add the polygon data. Briefly, all SpatialPolygonDataFrame objects require an x of long, y of lat, and group of group. I gave it no fill (via NA) and a dark grey border (take note that the border doesn’t show up in the legend because it is outside of the aes function)

MAP <- ggplot() + geom_polygon(data = CNTY, aes(x = long, y = lat, group = group), fill = NA, col = "grey20") + geom_point(data = as.data.frame(TEX), aes(x = decimalLongitude, y = decimalLatitude, col = "#880000"), pch = 19, alpha = 0.5)
## Regions defined for each Polygons
MAP

That legend doesn’t say very much.

MAP + scale_color_manual(name = "Legend", values = "#880000", labels = "Occurences") + theme(legend.position = c(0.2, 0.2), legend.background = element_blank())

When you are using the color (or fill, shape, size, etc.) aesthetic, you can change the legend using the ggplot2::scale_color_manual (or switch color for what you are interested in). Name changes the title, labels changes the legend labels, and values changes the actual color.

What if we wanted to see what counties aren’t represented by georeferenced BRTC specimens? What if we wanted to plot TWO polygons?

NORE <- CNTY@data %$% NAME_2 %>% setdiff(unique(TEX$county))

I used a pipeline to grab the county name (NAME_2) from the CNTY polygons and then checks for differences between the list of counties and those in our points dataset.

NORE <- CNTY[CNTY$NAME_2 %in% NORE, ]
NORE
## class       : SpatialPolygonsDataFrame 
## features    : 55 
## extent      : -103.9819, -94.65408, 28.81567, 36.50344  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 15
## names       : OBJECTID, ID_0, ISO,        NAME_0, ID_1, NAME_1, ID_2, NAME_2,   HASC_2, CCN_2, CCA_2, TYPE_2, ENGTYPE_2, NL_NAME_2, VARNAME_2 
## min values  :     2532,  244, USA, United States,   44,  Texas, 2532, Archer, US.TX.AC,    NA,      , County,    County,          ,           
## max values  :     2758,  244, USA, United States,   44,  Texas, 2758,  Upton, US.TX.UT,    NA,      , County,    County,          ,
plot(NORE)

MAP + geom_polygon(data = NORE, aes(x = long, y = lat, group = group), fill = "gray50", col = "gray20")

ggplot2 acts as a series of layers, so the missing county layer goes on top. If you want it below everything, put it further upstream in the ggplot2 code.

Raster

Now we are going to move to the raster package to deal with raster data

ELEV <- raster::getData(name = "alt", country = "USA")[[1]]

We are using our good friend the raster::getData function to grab some elevation data for the US (the [[1]] is there because the result without it is a list of 4 RasterLayers and we only want what applies to the continental US).

ELEV
## class       : RasterLayer 
## dimensions  : 3012, 6948, 20927376  (nrow, ncol, ncell)
## resolution  : 0.008333333, 0.008333333  (x, y)
## extent      : -124.8, -66.9, 24.4, 49.5  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +ellps=WGS84 
## data source : /Users/Adrian/Desktop/USA1_msk_alt.grd 
## names       : USA1_msk_alt 
## values      : -171, 4275  (min, max)

You can see that it is a RasterLayer with a ton of cells. You can also see the resolution, extent, crs, and min/max values for the elevation.

plot(ELEV) #Yep, thats the continental US (and the Rocky Mountains!)

What can you do with rasters?

raster::extract(ELEV, TEX@coords)

As you can see, we used the raster::extract function to grab the elevation of the cell that each specimen matches to. Neat!

You can crop the raster to a certain geographic extent.

EXT <- extent(CNTY)
EXT
## class       : Extent 
## xmin        : -106.6458 
## xmax        : -93.50874 
## ymin        : 25.83808 
## ymax        : 36.50344

Here, we just grabbed the geographic extent of the CNTY polygons using the raster::extent function (which basically just puts the CNTY@bbox info in a different object class). This isn’t really needed, but it can be a useful function for defining a study area.

plot(crop(ELEV, EXT))
plot(crop(ELEV, CNTY))

Both return the same result (the lack of Texas elevation) Butwhy.gif

?crop mentions that the y argument is an Extent object or something that can be used to create one. This is your other reminder that the help pages, their description of the arguments, and the examples at the bottom are EXTREMELY useful. Please read them.

You can also clip the raster to a polygon, but this can take a bit more time

Run the below line and then stretch or sit quietly, paralyzed by existential dread.

ELEV <- mask(crop(ELEV, CNTY), CNTY)
plot(ELEV)

I know y’all missed Texas, so I brought it back.

Now base plot is nice, but how do you plot rasters in ggplot2? You could use a combination of rasterVis::gplot and ggplot2::geom_tile, but that may lead to plotting issues. Using raster::rasterToPoints can be slow, but will plot everything.

ELEV <- as.data.frame(rasterToPoints(ELEV))
names(ELEV)[3] <- "value" 
ggplot(ELEV) + geom_raster(aes(x = x, y = y, fill = value)) 

As you can see, it filled the plot in according to the value. Don’t like blue?

RAS <- ggplot(ELEV) + geom_tile(aes(x = x, y = y, fill = value)) + scale_fill_viridis(option = "D", name = "Elevation (m)", na.value = "grey95") 
RAS

Well, isn’t that nice? The viridis package makes the viridis color scheme usable in R by scale_XXX_viridis. You may have noticed the option argument in scale_fill_viridis. Check out what happens if you change it to “A”, “B”, or “C”

Let’s add some polygons (and someone asked for contours, so let’s add that in too).

RAS + geom_polygon(data = CNTY, aes(x = long, y = lat, group = group), fill = NA, col = "grey20") + geom_contour(data = ELEV, aes(x = x, y = y, z = value), binwidth = 200, col = "white", alpha = 0.6) 

There you go. An elevation map of Texas counties with contour lines for every 200m increase in elevation.